transformers.LineByLineTextDataset (deprecated)

Superseded by huggingface/datasets

This dataset will be removed from the library soon, preprocessing should be handled with the 🤗 Datasets library.

The FutureWarning message gives the same guidance

Implements __len__ and __getitem__

Each element is a dict: {'input_ids': tensor([0, ..., 2])}

The tensor's dtype is torch.int64 (token ids)
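
The shape of this interface can be sketched in plain Python. This is a rough illustration only, not the transformers implementation: `ToyLineByLineDataset`, its whitespace "tokenizer", and the `bos_id`/`eos_id` parameters are all hypothetical, and plain lists stand in for the torch.int64 tensors so the sketch has no dependencies. The ids 0 and 2 are used as start/end markers only to mirror the example element above.

```python
import os
import tempfile


class ToyLineByLineDataset:
    """Hypothetical stand-in for a line-by-line dataset (not the transformers class)."""

    def __init__(self, file_path, bos_id=0, eos_id=2):
        # Keep only non-empty lines; each remaining line becomes one example.
        with open(file_path, encoding="utf-8") as f:
            lines = [line.strip() for line in f if line.strip()]

        # Toy vocabulary built on the fly; ids start at 3 so 0 and 2 stay reserved.
        vocab = {}
        self.examples = []
        for line in lines:
            ids = [vocab.setdefault(tok, len(vocab) + 3) for tok in line.split()]
            # The real class stores a torch.int64 tensor here; a list is used instead.
            self.examples.append({"input_ids": [bos_id] + ids + [eos_id]})

    def __len__(self):
        # Number of examples (one per non-empty line).
        return len(self.examples)

    def __getitem__(self, i):
        # Each element is a dict with a single 'input_ids' entry.
        return self.examples[i]


def build_demo():
    # Write a tiny corpus to a temporary file and load it.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "corpus.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("hello world\n\nhello again\n")
        return ToyLineByLineDataset(path)


demo = build_demo()
print(len(demo), demo[0])
```

Because only `__len__` and `__getitem__` are needed, an object like this can be indexed and measured the same way as the deprecated class, which is what a map-style PyTorch dataset consumer expects.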